# INT8 Quantized Inference
## Gte Multilingual Reranker Base Onnx Op14 Opt Gpu Int8

Quantized ONNX version of Alibaba-NLP/gte-multilingual-reranker-base, using INT8 quantization and optimized for GPU execution; suitable for text classification (reranking) tasks.

- License: MIT
- Task: Text Embedding
- Tags: Other
- Author: JustJaro
- Downloads: 91 · Likes: 1
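A minimal usage sketch for an INT8 ONNX reranker of this kind, assuming the quantized graph is stored locally as `model_int8.onnx` and emits one relevance logit per query-document pair (both are assumptions; check the repository for the actual file name and output shape):

```python
# Hedged sketch: score query-document pairs with an INT8 ONNX reranker on GPU.
# "model_int8.onnx" and the single-logit output layout are assumptions.
import onnxruntime as ort
from transformers import AutoTokenizer

# Tokenizer comes from the original, non-quantized base model repository.
tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-reranker-base")

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

query = "what is the capital of France?"
docs = [
    "Paris is the capital and largest city of France.",
    "Bananas are rich in potassium.",
]

enc = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="np")
# Feed only the inputs the exported graph declares (some exports drop token_type_ids).
feed = {i.name: enc[i.name] for i in session.get_inputs() if i.name in enc}
logits = session.run(None, feed)[0]
print(logits.squeeze(-1))  # higher logit = more relevant document
```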
## Qwen2.5 VL 3B Instruct Quantized.w8a8

Quantized version of Qwen/Qwen2.5-VL-3B-Instruct that accepts combined image-and-text input and produces text output, with both weights and activations quantized to INT8 (W8A8).

- License: Apache-2.0
- Task: Image-to-Text
- Tags: Transformers, English
- Author: RedHatAI
- Downloads: 274 · Likes: 1
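W8A8 checkpoints like this are commonly loaded through vLLM. The sketch below assumes that path; the repository id is inferred from the listed author and model name and may differ, and the prompt format is illustrative (image inputs would go through vLLM's multimodal API rather than a plain string prompt):

```python
# Hedged sketch: text-only generation with the W8A8 checkpoint via vLLM.
# The repository id is inferred from the listing (author + model name); verify it.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w8a8", max_model_len=4096)

prompt = (
    "<|im_start|>user\n"
    "In one sentence, what does W8A8 quantization mean?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=64))
print(out[0].outputs[0].text)
```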
## DeepSeek R1 Distill Qwen 32B Quantized.w8a8

Quantized version of DeepSeek-R1-Distill-Qwen-32B; INT8 weight and activation quantization (W8A8) reduces memory requirements and improves computational efficiency.

- License: MIT
- Task: Large Language Model
- Tags: Transformers
- Author: RedHatAI
- Downloads: 3,572 · Likes: 11
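A rough illustration of why W8A8 helps at this scale: storing each weight in one byte takes about half the memory of BF16 (two bytes per weight). The parameter count below is an approximation, and the estimate ignores activations, KV cache, and quantization scales:

```python
# Back-of-envelope weight-memory estimate (illustration only, not a measurement).
params = 32.5e9  # approximate parameter count for a Qwen2.5-32B-based model

bf16_gib = params * 2 / 1024**3  # 2 bytes per weight
int8_gib = params * 1 / 1024**3  # 1 byte per weight (ignores scales/zero-points)

print(f"BF16 weights: ~{bf16_gib:.0f} GiB")  # ~61 GiB
print(f"INT8 weights: ~{int8_gib:.0f} GiB")  # ~30 GiB
```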